mon: increase cache sizes #24247
Conversation
retest this please
This isn't an unreasonable amount of memory to expect a monitor to have, but right now they are often quite thin daemons (once you give them an SSD, anyway...) and the OSD map cache number is low because there were issues with monitors OOMing. So we're adding another ~500MB(+?) of memory needs to the monitor on reasonably-sized clusters, which isn't trivial.
Similarly we've been saying 1GB/TB on OSDs for a long time, but that messaging was quite confused for users of FileStore OSDs (I've certainly said on the list that I had no idea where it came from, since I think it was initially just made up without justification by some doc writer before we later decided it was a good idea?). Though I'm less worried about that change, assuming we don't backport it.
So generally I'm fine with changing these values, but we need at least some napkin math demonstrating they aren't real changes or else we need to message them pretty loudly.
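A rough version of that napkin math can be sketched as follows. The per-epoch osdmap sizes below are illustrative assumptions, not measured values; the only figures taken from this discussion are the cache size going from 10 to 500 epochs and the "~500MB" ballpark for large clusters:

```python
# Hedged napkin math for the mon osdmap cache growth (10 -> 500 cached epochs).
# Per-map sizes are assumptions for illustration; real osdmap size grows with
# the number of OSDs in the cluster.

MB = 1024 * 1024

def mon_cache_footprint(num_epochs, map_bytes):
    """Approximate memory held by caching num_epochs full osdmaps."""
    return num_epochs * map_bytes

# Small cluster: assume ~100 KB per full osdmap.
small = mon_cache_footprint(500, 100 * 1024)
# Large (~1000-OSD) cluster: assume ~1 MB per full osdmap.
large = mon_cache_footprint(500, 1 * MB)

print(f"small cluster: ~{small / MB:.0f} MB")  # ~49 MB
print(f"large cluster: ~{large / MB:.0f} MB")  # ~500 MB
```

Under these assumptions the worst case lines up with the ~500 MB figure above, while small clusters see only tens of megabytes of extra cache.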
I pushed commits that update the hardware recommendations about RAM in the docs. I also included a release note. The hardware recommendations definitely need a refresh--probably much more than I did here. (I would prefer not to block this critical fix to our defaults with a long conversation about the hardware recs, though!)
Metadata servers (ceph-mds)
---------------------------

The manager daemon memory utilization depends on how much memory its cache is
typo - s/manager/metadata/
Yeah, sounds reasonable; I like the doc changes. Reviewed-by: Greg Farnum gfarnum@redhat.com
@liewegas note there’s a merge conflict now though. :(
10 maps is too small to enable all mon sessions to keep abreast of the latest maps, especially if OSDs are down for any period of time during an upgrade. Note that this is quite a bit larger, but the memory usage of the mon will scale proportionally to the size of the cluster: 500 small osdmaps is not a significant amount of RAM, while conversely having a large cache is most important on a large cluster and those mons will generally have plenty of RAM available. Someday we should control this with a memory envelope like we do with the OSDs, but that remains future work.

Signed-off-by: Sage Weil <sage@redhat.com>
For filestore OSDs, this is probably a good idea anyway, and is generally not going to be hugely impactful on the memory footprint (where users have been told to provide 1 GB RAM per 1 TB storage for a long time now). For bluestore OSDs, this value is meaningless as we're autotuning this anyway. For mons, this is a more reasonable default.

Signed-off-by: Sage Weil <sage@redhat.com>
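As a sanity check on the long-standing FileStore guidance mentioned above, the "1 GB RAM per 1 TB storage" rule of thumb works out like this (a sketch; the ratio is the only input taken from the discussion, the drive sizes are hypothetical):

```python
# Sketch of the "1 GB RAM per 1 TB storage" FileStore rule of thumb.
# Drive and host sizes below are hypothetical examples, not recommendations.

def osd_ram_gb(storage_tb, gb_per_tb=1.0):
    """RAM suggested for OSDs backing storage_tb terabytes of storage."""
    return storage_tb * gb_per_tb

# A single 4 TB HDD OSD -> 4 GB RAM.
print(osd_ram_gb(4))        # 4.0
# A host with 12 x 4 TB OSDs -> 48 GB RAM for the OSDs alone.
print(osd_ram_gb(12 * 4))   # 48.0
```

For BlueStore OSDs this rule is moot, as the commit message notes, since the cache is autotuned against a memory target instead.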
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/24247/head:
	PendingReleaseNotes: add note about increased mon memory footprint
	doc/start/hardware-recommendations: refresh recommendations for RAM
	rocksdb: increase default cache size to 512 MB
	mon: mon_osd_cache_size = 500 (from 10)

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
These two changes should mitigate the luminous->mimic upgrade disaster recently experienced by a user with a ~1000 node cluster.
I think these new defaults are reasonable, but comments welcome!
Longer term, I think we need a strategy for dynamically sizing these caches based on the size of the cluster.
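One possible shape for that dynamic sizing: derive the number of cached osdmap epochs from a memory budget rather than a fixed count, clamped to sane bounds. This is purely an illustrative sketch, not anything implemented in this PR; the budget and map sizes are hypothetical:

```python
# Illustrative sketch (not part of this PR): size the osdmap cache from a
# memory budget instead of a fixed epoch count, clamped to [min, max] epochs.

MB = 1024 * 1024

def dynamic_cache_epochs(budget_bytes, est_map_bytes,
                         min_epochs=10, max_epochs=500):
    """Number of osdmap epochs to cache given a byte budget and an
    estimated per-epoch map size, clamped to reasonable bounds."""
    if est_map_bytes <= 0:
        return max_epochs
    epochs = budget_bytes // est_map_bytes
    return max(min_epochs, min(max_epochs, epochs))

# Small cluster (~100 KB maps), 64 MB budget -> hits the 500-epoch cap.
print(dynamic_cache_epochs(64 * MB, 100 * 1024))  # 500
# Huge cluster (~4 MB maps), same budget -> only 16 epochs fit.
print(dynamic_cache_epochs(64 * MB, 4 * MB))      # 16
```

The clamping keeps the behavior no worse than today's fixed defaults at either extreme, which is roughly the "memory envelope" idea mentioned in the commit message.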